Search | Global Index Medicus

1.

Analysis of unmapped regions associated with long deletions in Korean whole genome sequences based on short read data

Yuna LEE; Kiejung PARK; Insong KOH.

Genomics & Informatics ; : e40-2019.

Article in English | WPRIM | ID: wpr-830121

ABSTRACT

While studies aimed at detecting and analyzing indels or single nucleotide polymorphisms within human genomic sequences have been actively conducted, studies on detecting long insertions/deletions are difficult to carry out. Over the last 10 years, the availability of long read data of human genomes from PacBio or Nanopore platforms has increased, which makes it easier to detect long insertions/deletions. However, because long read data remain relatively costly, next generation sequencing data are still produced mainly by short read sequencing machines. Here, we constructed programs to detect so-called unmapped regions (UMRs, regions of the reference genome to which no reads are mapped), scanned 40 Korean genomes to select UMR long deletion candidates, and compared the candidates with the long deletion break points within the genomes available from the 1000 Genomes Project (1KGP). An average of about 36,000 UMRs were found in the 40 Korean genomes tested, 284 UMRs were common across the 40 genomes, and a total of 37,943 UMRs were found. Of these, 30,698 UMRs overlapped with the 74,045 break points provided by the 1KGP. As the number of compared samples increased from 1 to 40, the number of UMRs that overlapped with the break points also increased, eventually reaching a peak of 80.9% of the total UMRs found in this study. As the total number of overlapped UMRs could probably grow to encompass the 74,045 break points with the inclusion of more Korean genomes, this approach could be practically useful for studies on long deletions utilizing short read data.

2.

Analysis of unmapped regions associated with long deletions in Korean whole genome sequences based on short read data

Yuna LEE; Kiejung PARK; Insong KOH.

Genomics & Informatics ; : 40-2019.

Article in English | WPRIM | ID: wpr-785801

ABSTRACT

While studies aimed at detecting and analyzing indels or single nucleotide polymorphisms within human genomic sequences have been actively conducted, studies on detecting long insertions/deletions are difficult to carry out. Over the last 10 years, the availability of long read data of human genomes from PacBio or Nanopore platforms has increased, which makes it easier to detect long insertions/deletions. However, because long read data remain relatively costly, next generation sequencing data are still produced mainly by short read sequencing machines. Here, we constructed programs to detect so-called unmapped regions (UMRs, regions of the reference genome to which no reads are mapped), scanned 40 Korean genomes to select UMR long deletion candidates, and compared the candidates with the long deletion break points within the genomes available from the 1000 Genomes Project (1KGP). An average of about 36,000 UMRs were found in the 40 Korean genomes tested, 284 UMRs were common across the 40 genomes, and a total of 37,943 UMRs were found. Of these, 30,698 UMRs overlapped with the 74,045 break points provided by the 1KGP. As the number of compared samples increased from 1 to 40, the number of UMRs that overlapped with the break points also increased, eventually reaching a peak of 80.9% of the total UMRs found in this study. As the total number of overlapped UMRs could probably grow to encompass the 74,045 break points with the inclusion of more Korean genomes, this approach could be practically useful for studies on long deletions utilizing short read data.

Subject(s)

Humans , Genome , Genome, Human , Nanopores , Polymorphism, Single Nucleotide

3.

Identification of Ethnically Specific Genetic Variations in Pan-Asian Ethnos

Jin-Ok YANG; Sohyun HWANG; Woo-Yeon KIM; Seong-Jin PARK; Sang-Cheol KIM; Kiejung PARK; Byungwook LEE.

Genomics & Informatics ; : 42-47, 2014.

Article in English | WPRIM | ID: wpr-187159

ABSTRACT

Asian populations contain a variety of ethnic groups that have ethnically specific genetic differences. Ethnic variants may be highly relevant in disease and human differentiation studies. Here, we identified ethnically specific variants and then investigated their distribution across Asian ethnic groups. We obtained 58,960 Pan-Asian single nucleotide polymorphisms of 1,953 individuals from 72 ethnic groups of 11 Asian countries. We selected 9,306 ethnic variant single nucleotide polymorphisms (ESNPs) and 5,167 ethnic variant copy number polymorphisms (ECNPs) using the nearest shrunken centroid method. We analyzed ESNPs and ECNPs in 3 hierarchical levels: superpopulation, subpopulation, and ethnic population. We also identified ESNP- and ECNP-related genes and their features. This study represents the first attempt to identify Asian ESNP and ECNP markers, which can be used to identify genetic differences and predict disease susceptibility and drug effectiveness in Asian ethnic populations.

Subject(s)

Humans , Asian People , Classification , Disease Susceptibility , DNA Copy Number Variations , Ethnicity , Genetic Variation , Genotype , Polymorphism, Single Nucleotide

4.

ManBIF: a Program for Mining and Managing Biobank Impact Factor Data

Ki-Jin YU; Jungmin NAM; Yun HER; Minseock CHU; Hyungseok SEO; Junwoo KIM; Jaepil JEON; Hyekyung PARK; Kiejung PARK.

Genomics & Informatics ; : 37-38, 2011.

Article in English | WPRIM | ID: wpr-171924

ABSTRACT

Biobank Impact Factor (BIF), which is a very effective criterion for evaluating the activity of biobanks, can be estimated from the citation information of biobanks in scientific papers. We have developed a program, ManBIF, to investigate the citation information from PDF files in the literature. The program manages a dictionary of expressions that represent biobanks and their resources, mines the citation information by converting PDF files to text files and searching them with the dictionary, and produces a statistical report file. It can be used as an important tool by biobanks.

Subject(s)

Mining

5.

PromoterWizard: An Integrated Promoter Prediction Program Using Hybrid Methods

Kiejung PARK; Ki-Bong KIM.

Genomics & Informatics ; : 194-196, 2011.

Article in English | WPRIM | ID: wpr-73129

ABSTRACT

Promoter prediction is a very important problem and is closely related to the main problems of bioinformatics, such as the construction of gene regulatory networks and gene function annotation. In this context, we developed an integrated promoter prediction program using hybrid methods, PromoterWizard, which can be employed to detect the core promoter region and the transcription start site (TSS) in vertebrate genomic DNA sequences, an issue of obvious importance for genome annotation efforts. PromoterWizard consists of three main modules and two auxiliary modules. The three main modules are the CDRM (Composite Dependency Reflecting Model) module, the SVM (Support Vector Machine) module, and the ICM (Interpolated Context Model) module. The two auxiliary modules are CpG Island Detector and GCPlot, which may contribute to improving the predictive accuracy of the three main modules and to helping a human curator decide on the final annotation.

Subject(s)

Humans , Base Sequence , Chimera , Computational Biology , CpG Islands , Dependency, Psychological , Gene Regulatory Networks , Genome , Promoter Regions, Genetic , Transcription Initiation Site , Vertebrates

6.

BioStore: A Repository System for Registering and Distributing Public Biology Databases

Hongseok TAE; Jeong-Min HAN; Bu-Young AHN; Kiejung PARK.

Genomics & Informatics ; : 49-51, 2009.

Article in English | WPRIM | ID: wpr-76618

ABSTRACT

Although abundant biology data have been accumulated in public biology databases, such as GenBank and PIR, few services with easy-to-use interfaces are provided for users to access or update them. We have developed a system, named BioStore, that is composed of several programs to help users not only access public data but also share their own data easily. The service can be used for maintaining a local database as a repository of raw data files of several public databases and distributing the data files to other users. Currently, BioStore manipulates major bio-databases and will expand to include more databases and more useful interfaces.

Subject(s)

Biology , Databases, Nucleic Acid , Formycins , Ribonucleotides , Information Storage and Retrieval

7.

WinBioDBs: A Windows-based Integrated Program for Manipulating Major Biological Databases

Hyeweon NAM; Jin-Ho LEE; Kiejung PARK.

Genomics & Informatics ; : 175-177, 2009.

Article in English | WPRIM | ID: wpr-10787

ABSTRACT

We have developed WinBioDBs with Windows interfaces, which include importing modules and searching interfaces for 10 major public databases: GenBank, PIR, SwissProt, Pathway, EPD, ENZYME, REBASE, Prosite, Blocks, and Pfam. User databases can be constructed from the search results of queries, and their entries can be edited. The program is a stand-alone database searching program for Windows PCs. Database update features are supported by importing raw database files and indexing them after download. Users can adjust their own searching environments and report formats and construct their own projects consisting of a combination of local databases. WinBioDBs is implemented with VC++ and its database is based on MySQL.

Subject(s)

Abstracting and Indexing , Databases, Nucleic Acid , Databases, Protein

8.

COCAW: A Genome-wide Pattern Search System for Designing Microbial Probes

Seunghee RYU; Kiejung PARK; Dohoon LEE; Cheol-Min KIM.

Genomics & Informatics ; : 178-180, 2009.

Article in English | WPRIM | ID: wpr-10786

ABSTRACT

A few bioinformatics tools have been used to find conserved regions as probes. We have developed a system based on a heuristic method with web interfaces to find conserved regions in microbial genomes. The system runs in real time by using relative entropy in limited narrow regions and detecting similar regions between pairs of regions with local alignment. The system could be useful for finding conserved regions at genome-wide scale.

Subject(s)

Computational Biology , Entropy , Genome

9.

A Bio-database Management System for the Monitoring and Automatic FTP of Public Databases

Hongseok TAE; Jeong-Min HAN; Bu-Young AHN; Kiejung PARK.

Genomics & Informatics ; : 95-97, 2008.

Article in English | WPRIM | ID: wpr-110088

ABSTRACT

Many bioinformatics sites manage local bio-databases, including major databases such as GenBank and PIR, and bear the load of keeping them updated. We have developed several programs to monitor the update status of these databases and to retrieve them automatically by FTP. These programs can be used for keeping local bio-databases at their most recent versions and for providing up-to-date databases through FTP sites. Currently, the programs serve major bio-databases and will be extended to accommodate many more bio-databases.

Subject(s)

Computational Biology , Databases, Nucleic Acid , Formycins , Organothiophosphorus Compounds , Ribonucleotides

10.

Computational Approach for the Analysis of Post-PKS Glycosylation Step

Ki-Bong KIM; Kiejung PARK.

Genomics & Informatics ; : 223-226, 2008.

Article in English | WPRIM | ID: wpr-59841

ABSTRACT

We introduce a computational approach for analysis of glycosylation in Post-PKS tailoring steps. It is a computational method to predict the deoxysugar biosynthesis unit pathway and the substrate specificity of glycosyltransferases involved in the glycosylation of polyketides. In this work, a directed and weighted graph is introduced to represent and predict the deoxysugar biosynthesis unit pathway. In addition, a homology based gene clustering method is used to predict the substrate specificity of glycosyltransferases. It is useful for the rational design of polyketide natural products, which leads to in silico drug discovery.

Subject(s)

Biological Factors , Computer Simulation , Glycosylation , Glycosyltransferases , Polyketides , Substrate Specificity

11.

Computational Approach for Biosynthetic Engineering of Post-PKS Tailoring Enzymes

Ki-Bong KIM; Kiejung PARK.

Genomics & Informatics ; : 227-230, 2008.

Article in English | WPRIM | ID: wpr-59840

ABSTRACT

Compounds of polyketide origin possess a wealth of pharmacological effects, including antibacterial, antifungal, antiparasitic, anticancer and immunosuppressive activities. Many of these compounds and their semisynthetic derivatives are used today in the clinic. Most of the gene clusters encoding commercially important drugs have also been cloned and sequenced and their biosynthetic mechanisms studied in great detail. The area of biosynthetic engineering of the enzymes involved in polyketide biosynthesis has recently advanced and been transferred into the industrial arena. In this work, we introduce a computational system to provide the user with a wealth of information that can be utilized for biosynthetic engineering of enzymes involved in post-PKS tailoring steps. Post-PKS tailoring steps are necessary to add functional groups essential for the biological activity and are therefore important in polyketide biosynthesis.

Subject(s)

Clone Cells , Multigene Family

12.

RGISS: Rice (Oryza sativa L. ssp. japonica) Genome Information Service System

Daesang LEE; Hwajung SEO; Jang-Ho HAHN; Eun-Bae KONG; Kiejung PARK.

Genomics & Informatics ; : 194-195, 2007.

Article in English | WPRIM | ID: wpr-21114

ABSTRACT

We have constructed the Rice Genome Information Service System (RGISS), which is an information service system for the Oryza sativa L. ssp. japonica (rice) genome, using the released version of rice Build 3.0 pseudomolecules based on the Ensembl architecture. The nonredundant library, composed of 3,360 clones of BACs, PACs, and fosmids, was used to construct supercontigs. RGISS contains 50,717 annotated genes from GenBank, 56,161 predicted genes from FgeneSH, and information on 9,587 markers, which include STS, SSR, and EST-based RFLP. The 20,180 ESTs sequenced by the Korea National Institute of Agricultural Biotechnology (NIAB) were aligned and mapped into 168,792 exons. By gene ontology analysis, 6,158, 4,531, and 12,364 proteins in the rice genome were classified under molecular function, cellular component, and biological process, respectively.

Subject(s)

Biological Phenomena , Biotechnology , Clone Cells , Databases, Nucleic Acid , Exons , Expressed Sequence Tags , Gene Ontology , Genome , Information Services , Korea , Polymorphism, Restriction Fragment Length , Oryza
